A perceptual study of acceleration parameters in HMM-based TTS
Authors
Abstract
In HMM-based TTS, statistical models of static, velocity (delta), and acceleration (delta-delta) parameters are jointly trained in a unified, ML-based framework. A previous study has shown that the acceleration parameters help generate smoother trajectories with less distortion, but the effect has never been investigated in formal objective and subjective tests. In this paper, the effect of the acceleration parameters on trajectory generation, in addition to that of their static and velocity counterparts, is studied in depth. We show that discarding the acceleration parameters introduces only a small additional distortion compared to the reference generated with the full model parameters. However, human subjects can easily perceive the degradation in voice quality, because saw-tooth-like trajectories are commonly generated. Several methods to alleviate the discontinuity are discussed, and we choose the upper- and lower-bounded envelopes of the saw-tooth trajectories for further analysis. Experimental results show that both envelope trajectories have larger objective distortions than the saw-tooth ones. Nevertheless, the speech synthesized using the envelope trajectory is perceptually transparent with respect to the reference. This study, in addition to its subjective and objective significance in measuring the distortion of synthesized speech, facilitates efficient implementation of low-cost TTS systems, as well as low-bit-rate speech coding and reconstruction.
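As an illustration of the static, velocity, and acceleration parameters mentioned in the abstract, the following minimal sketch derives delta and delta-delta tracks from a one-dimensional static track. The window coefficients (-0.5, 0, 0.5) and (1, -2, 1) are a commonly used choice and are an assumption here, since the abstract does not state which windows the authors used. The function names and the toy tracks are hypothetical. The sketch only shows why a saw-tooth track has large acceleration values while a smooth one does not; it is not the paper's trajectory generation algorithm.

```python
# Assumed 3-tap windows; the abstract does not specify the ones actually used.
DELTA_WINDOW = [-0.5, 0.0, 0.5]   # velocity (delta)
ACCEL_WINDOW = [1.0, -2.0, 1.0]   # acceleration (delta-delta)


def apply_window(track, window):
    """Apply a centered window to a 1-D track, repeating the edge frames."""
    half = len(window) // 2
    last = len(track) - 1
    out = []
    for t in range(len(track)):
        acc = 0.0
        for k, w in enumerate(window):
            # Clamp the frame index so boundary frames reuse the edge value.
            idx = min(max(t + k - half, 0), last)
            acc += w * track[idx]
        out.append(acc)
    return out


def dynamic_features(static):
    """Return (delta, delta-delta) tracks for a static parameter track."""
    return apply_window(static, DELTA_WINDOW), apply_window(static, ACCEL_WINDOW)


# Hypothetical toy tracks: a linear ramp and a saw-tooth.
smooth = [0.0, 1.0, 2.0, 3.0, 4.0]
sawtooth = [0.0, 1.0, 0.0, 1.0, 0.0]

smooth_delta, smooth_accel = dynamic_features(smooth)
saw_delta, saw_accel = dynamic_features(sawtooth)

print("ramp      delta-delta:", smooth_accel)
print("saw-tooth delta-delta:", saw_accel)
```

On the interior frames, the ramp has zero acceleration while the saw-tooth alternates between large negative and positive values. This is the kind of discontinuity that a model of the acceleration parameters would penalize during trajectory generation.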
Similar resources
XIMERA: a new TTS from ATR based on corpus-based technologies
This paper describes a new concatenative TTS system under development at ATR. The system, named XIMERA, is based on corpus-based technologies, as was the case for the preceding TTS systems from ATR, namely ν-talk and CHATR. The prominent features of XIMERA are (1) large corpora (a 110-hours corpus of a Japanese male, a 60-hours corpus of a Japanese female, and a 20-hours corpus of a Chinese fema...
HMM-based TTS for Hanoi Vietnamese: issues in design and evaluation
This paper presents the development and evaluation of an HMM-based TTS system for the modern Hanoi dialect of Northern Vietnamese, a tonal language. A study of specific phonetic and prosodic features of Hanoi Vietnamese is discussed. Consequences on the design of an HMM-based TTS system are derived. Using this knowledge, a TTS system, called VTed, is then developed under the Mary TTS platform. ...
Advances in Spectral Parameterization for Statistical (HMM-Based) TTS
HMM-based parametric speech synthesis has recently become an alternative to the concatenative TTS approach, especially when low footprint and general speech domain are required. A majority of speech parameterization models used in state-of-the-art HMM TTS systems employ source-filter waveform synthesis schemes. Sinusoidal representation and waveform generation of speech is an alternative to the ...
Linguistic and mixed excitation improvements on a HMM-based speech synthesis for Castilian Spanish
Hidden Markov Models based text-to-speech (HMM-TTS) synthesis is one of the techniques for generating speech from trained statistical models where spectrum and prosody of basic speech units are modelled altogether. This paper presents the advances in our Spanish HMM-TTS and a perceptual test is conducted to compare it with an extended PSOLA-based concatenative (E-PSOLA) system. The improvements...
Syllable HMM based Mandarin TTS and comparison with concatenative TTS
This paper introduces a Syllable HMM based Mandarin TTS system. 10-state left-to-right HMMs are used to model each syllable. We leverage the corpus and the front end of a concatenative TTS system to build the Syllable HMM based TTS system. Furthermore, we utilize the unique consonant/vowel structure of Mandarin syllable to improve the voiced/unvoiced decision of HMM states. Evaluation results s...